Morphological annotation of Korean with Directly Maintainable Resources
Authors
Abstract
This article describes an exclusively resource-based method for the morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before a syntactic parser operates on it. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The tagset is 3 to 5 times finer-grained than usual tagsets. A comparison with a reference annotated corpus showed that the annotator achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes, and transducers for the generation of allomorphs. All can be easily updated, which allows users to control how the performance of the system evolves. It has been claimed that morphological annotation of Korean text can only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words, without applying morphological rules at annotation time, which speeds up annotation to 1,210 words/s. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process.
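The idea of compiling a lexicon of words ahead of time, so that annotation reduces to lookup, can be illustrated with a minimal Python sketch. It is not the system described in the abstract: the toy stem list, the suffix table with its "bright"/"dark" harmony classes, the tag names, and the functions `compile_word_lexicon` and `annotate` are all assumptions made for illustration, and plain dictionaries stand in for the transducers.

```python
# Toy resources. All entries, tag names, and harmony classes are
# illustrative assumptions, not the resources of the described system.
STEMS = [
    # (stem surface form, lemma, part of speech, vowel-harmony class)
    ("먹", "먹다", "VERB", "dark"),
    ("잡", "잡다", "VERB", "bright"),
]

SUFFIX_SEQUENCES = [
    # (required harmony class, or None for any; list of (surface, tag))
    ("dark", [("었", "PAST"), ("다", "DECL")]),
    ("bright", [("았", "PAST"), ("다", "DECL")]),
    (None, [("는다", "PRES.DECL")]),
]


def compile_word_lexicon(stems, suffix_sequences):
    """Expand stems and suffix sequences into a word -> analyses table."""
    lexicon = {}
    for surface, lemma, pos, harmony in stems:
        for required, morphemes in suffix_sequences:
            # Allomorph selection happens here, at compile time.
            if required is not None and required != harmony:
                continue
            word = surface + "".join(m for m, _ in morphemes)
            analysis = [(surface, f"{pos}:{lemma}")] + list(morphemes)
            lexicon.setdefault(word, []).append(analysis)
    return lexicon


def annotate(text, lexicon):
    """Annotate whitespace-separated words by lookup only, with no rules."""
    return [(token, lexicon.get(token, [])) for token in text.split()]


WORD_LEXICON = compile_word_lexicon(STEMS, SUFFIX_SEQUENCES)
print(annotate("먹었다 잡았다", WORD_LEXICON))
```

In this sketch, updating a stem or suffix entry and recompiling is the only maintenance step, and unknown words simply receive an empty list of analyses.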
Similar resources
A Resource-based Korean Morphological Annotation System
We describe a resource-based method for the morphological annotation of written Korean text. Korean is an agglutinative language. The output of our system is a graph of morphemes annotated with accurate linguistic information. The language resources used by the system can be easily updated, which allows users to control how the performance of the system evolves. We show that morphological anno...
Penn Korean Treebank: Development and Evaluation
With growing interest in Korean language processing, numerous natural language processing (NLP) tools for Korean, such as part-of-speech (POS) taggers, morphological analyzers, and parsers, have been developed. This progress was possible through the availability of large-scale raw text corpora and POS tagged corpora (ETRI, 1999; Yoon and Choi, 1999a; Yoon and Choi, 1999b). However, no large-scale...
Segmentation Granularity in Dependency Representations for Korean
Previous work on Korean language processing has proposed different basic segmentation units. This paper explores different possible dependency representations for Korean using different levels of segmentation granularity — that is, different schemes for morphological segmentation of tokens into syntactic words. We provide a new Universal Dependencies (UD)-like corpus based on different levels o...
The Comparative Impact of Pictorial Annotations and Morphological Instruction on Lexical Inferencing of Iranian Intermediate EFL Learners
One of the main ways to acquire unfamiliar words is to make guesses about word meanings. This study investigates the comparative effects of pictorial annotations and morphological instructions on Iranian EFL learners’ lexical inferencing ability. Considering homogeneity issues using PET (Preliminary English Test), the researchers assigned the participants to two experimental and one control g...
A Tool for Korean Semantic Annotated Corpus Construction
Although a semantically annotated corpus is necessary for semantic role labeling, no such corpus has been constructed for Korean. This paper establishes a tool for the construction of a Korean semantically annotated corpus, including a Korean Proposition Bank (PropBank). The Sejong predicate case frame dictionary was used as one of the linguistic resources, and a Korean syntactic annotate...